SOFTWARE: ASP.NET | VB.NET | C#.NET | RAZOR MVC 4 ASP.NET | RESTful Web services
Rail accidents represent an important safety concern for the transportation industry in many countries. These accidents cause extensive damage, deaths of crew and passengers, and many injuries. To improve safety, the Federal Railroad Administration (FRA) has required the railroads to submit reports that describe the characteristics of each accident with fixed field entries and narratives. While a number of studies have looked at the fixed fields, none have done an extensive analysis of the narratives. This project describes the use of text mining with a combination of techniques to automatically discover accident characteristics that can inform a better understanding of the contributors to the accidents. The results show that predictive accuracy for accident costs significantly improves through the use of features found by text mining.
Information extraction is an initial step in analysing unstructured text. Its job is to simplify the text: it recognizes phrases and finds the relations between them, producing structured information from unstructured information. It is suitable for large volumes of text.
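As a minimal sketch of this idea, the Python snippet below pulls a few structured fields out of a free-text narrative with regular expressions. The narrative, the field names (speed_mph, cars_derailed, stated_cause) and the patterns are all assumptions made for illustration; they are not taken from the FRA reports or from the project's own code.

```python
import re

# Hypothetical narrative, invented for illustration (not taken from FRA data).
NARRATIVE = (
    "Train was traveling at 25 mph when 3 cars derailed "
    "near the switch due to a broken rail."
)

# Each pattern recognizes one kind of phrase and names the field it fills.
# The field names and patterns are assumptions, not an official schema.
PATTERNS = {
    "speed_mph": re.compile(r"(\d+)\s*mph", re.IGNORECASE),
    "cars_derailed": re.compile(r"(\d+)\s+cars?\s+derailed", re.IGNORECASE),
    "stated_cause": re.compile(
        r"due to (?:an? |the )?([a-z ]+?)[.,;]", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return a dict of structured fields recognized in free text."""
    record = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            value = match.group(1).strip()
            # Store numeric phrases as integers, everything else as text.
            record[name] = int(value) if value.isdigit() else value
    return record


print(extract_fields(NARRATIVE))
```

Hand-written patterns like these only cover phrasings someone anticipated, which is one reason a real system would likely combine them with statistical methods.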
Clustering focuses on similarity measures between objects and has no predefined class labels. It groups similar texts together and, in the same way, builds clusters of groups. Words are isolated quickly and a weight is assigned to each word. After the similarities are calculated, the clustering algorithm generates the list of classes.
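The steps above (isolate words, weight them, compute similarities, form groups) can be sketched in Python as follows. This is only an illustration under stated assumptions: the four narratives are invented, TF-IDF is assumed as the word weighting, cosine similarity as the similarity measure, and a simple single-pass threshold rule stands in for whichever clustering algorithm the project actually uses.

```python
import math
from collections import Counter

# Hypothetical narratives, invented for illustration.
DOCS = [
    "broken rail caused derailment of freight cars",
    "derailment after broken rail on main track",
    "crew failed to stop at signal before collision",
    "collision occurred when crew passed stop signal",
]


def tfidf_vectors(docs):
    """Split each document into words and assign a TF-IDF weight to each."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.2):
    """Single-pass clustering: join the first cluster whose seed document
    is similar enough, otherwise start a new cluster."""
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


print(cluster(DOCS))
```

No class labels are supplied anywhere; the groups emerge only from the similarity scores, and the threshold value of 0.2 is an arbitrary choice for this toy input.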
In this module, we integrate methods for safety analysis with accident report data and text mining to uncover contributors to rail accidents. This section describes related work in rail and, more generally, transportation safety and also introduces the relevant data and text mining techniques.
Each report has a number of fields that include characteristics of the train or trains, the personnel on the trains, operational conditions (e.g., speed at the time of accident, highest speed before the accident, number of cars, and weight), and the primary cause of the accident. Text mining as a field has become increasingly important because of the large amounts of data available in documents, news articles, research papers, and accident reports.
Text databases are semi-structured because, in addition to the free text, they also contain structured fields such as titles, authors, dates, and other metadata. The accident reports used in this paper are semi-structured.
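One way to picture a semi-structured accident report is as a record with typed fixed fields plus one free-text field. The Python sketch below assumes this layout; the class name, the field names and the sample values are hypothetical and do not reproduce the FRA's actual report form.

```python
from dataclasses import dataclass, asdict


@dataclass
class AccidentReport:
    # Fixed (structured) fields; names are illustrative, not the FRA's own.
    speed_mph: int
    highest_speed_mph: int
    num_cars: int
    weight_tons: int
    primary_cause: str
    # Free-text (unstructured) part of the report.
    narrative: str

    def structured(self):
        """Return only the fixed fields, leaving out the narrative."""
        fields = asdict(self)
        fields.pop("narrative")
        return fields


# Sample values are invented for illustration.
report = AccidentReport(
    speed_mph=25,
    highest_speed_mph=30,
    num_cars=40,
    weight_tons=3200,
    primary_cause="broken rail",
    narrative="Train derailed near the switch.",
)
print(report.structured())
print(report.narrative)
```

Keeping the two parts in one record lets the fixed fields feed a conventional model while the narrative goes through text mining.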
Rail accidents have been increasing over the past 11 years. These accidents cause damage, deaths of crew and passengers, and many injuries. We therefore need to understand the characteristics of these accidents to improve safety measures. The FRA prepares reports about these accidents, which include characteristics of the train or trains, the environmental conditions, operational conditions, etc. The FRA uses these reports to “develop hazard elimination and risk reduction programs that focus on preventing railroad injuries and accidents”. However, as with many safety and regulatory agencies, its analyses are effective mainly on aggregate trends.
In our proposed work, we investigate the possible predictors or contributors to accidents obtained from “mining” the narrative text in rail accident reports. To do this, we first identify the accidents of interest and then look for relationships in the structured and unstructured data. We then evaluate the efficacy of the features found from text mining. Finally, the study teases apart the text-mined features whose importance is confirmed by predictive accuracy.
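The evaluation step can be illustrated with a small Python sketch that compares prediction error with and without a text-mined feature. Everything here is an assumption for illustration: the records are invented, the text feature is a single hypothetical phrase ("broken rail"), and a group-mean predictor stands in for the project's actual models. Because the numbers are made up, the output says nothing about real accidents; only the comparison procedure matters.

```python
from statistics import mean

# Invented records: (fixed-field speed in mph, narrative, damage cost).
TRAIN = [
    (10, "car derailed at switch", 20000),
    (12, "broken rail caused derailment", 90000),
    (30, "minor derailment in yard", 25000),
    (28, "broken rail under loaded train", 110000),
]
TEST = [
    (11, "derailment due to broken rail", 100000),
    (29, "car derailed in yard", 22000),
]


def has_term(narrative, term="broken rail"):
    """Text-mined feature: does the narrative mention the term?"""
    return term in narrative.lower()


def fixed_key(row):
    """Feature set using the fixed field only (speed band)."""
    return row[0] >= 20


def text_key(row):
    """Feature set using the fixed field plus the text-mined feature."""
    return (row[0] >= 20, has_term(row[1]))


def fit_group_means(rows, key):
    """Toy model: predict the mean training cost of each feature group."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[2])
    return {k: mean(costs) for k, costs in groups.items()}


def mean_abs_error(rows, key, model, fallback):
    """Mean absolute error on held-out rows; unseen groups use fallback."""
    return mean(abs(row[2] - model.get(key(row), fallback)) for row in rows)


overall = mean(row[2] for row in TRAIN)
fixed_mae = mean_abs_error(TEST, fixed_key, fit_group_means(TRAIN, fixed_key), overall)
text_mae = mean_abs_error(TEST, text_key, fit_group_means(TRAIN, text_key), overall)
print("fixed fields only:", fixed_mae)
print("fixed fields + text feature:", text_mae)
```

A feature set counts as useful under this procedure only when it lowers the error on records the model has not seen.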
In this project, we show that the combination of text analysis with ensemble methods can improve the accuracy of models for predicting accident severity and that text analysis can provide insights into accident characteristics. Modern text analysis methods make the narratives in the accident reports almost as accessible for detailed analysis as the fixed fields in the reports. More importantly, as the examples illustrate, text mining of the narratives can provide a much richer amount of information than is possible in the fixed fields. Finally, the work described here used standard methods to clean the narratives. However, train accident narratives use jargon common to the rail transport industry, and classical stemming and stop word removal do not necessarily do a good job of characterizing the words used in this industry. For train safety analysis, text mining could benefit from a careful look at ways to extract features from text that take advantage of language characteristics particular to the rail transport industry.
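The point about jargon can be made concrete with a short Python sketch of narrative cleaning. It assumes a tiny stop word list, a crude suffix-stripping function as a stand-in for a classical stemmer, and a jargon map whose entries (eng, trk, derld) are invented placeholders rather than a vetted rail glossary.

```python
import re

# Small illustrative stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "at", "on", "to", "and", "was", "in"}

# Hypothetical jargon map: entries are placeholders, not a vetted glossary.
JARGON = {"eng": "engine", "trk": "track", "derld": "derailed"}


def crude_stem(word):
    """Very rough suffix stripping, standing in for a classical stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(narrative, jargon=None):
    """Lowercase, tokenize, optionally expand jargon, drop stop words, stem."""
    tokens = re.findall(r"[a-z]+", narrative.lower())
    if jargon:
        tokens = [jargon.get(token, token) for token in tokens]
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]


print(clean("Eng derld on the trk"))          # standard cleaning only
print(clean("Eng derld on the trk", JARGON))  # with jargon expansion
print(clean("The engine derailed"))           # plain-language equivalent
```

Without the jargon map, the abbreviated narrative and its plain-language equivalent share no tokens after cleaning, so a model would treat them as unrelated.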